Machine Learning Pipeline

An Implementation of DonorsChoose Project

Yeol Ye
University of Chicago
ziyuye@uchicago.edu

Phase 1: Data Preparation

In [25]:
import pandas as pd
import numpy as np
import seaborn as sns
from scipy.stats import norm, lognorm
import statsmodels.api as sm
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
%matplotlib inline
In [26]:
import os
import sys
import warnings
sys.path.append('../code/')  # make the project's helper modules importable before importing them
import prep
warnings.filterwarnings('ignore')
In [140]:
data_file = 'projects_2012_2013.csv'
df = pd.read_csv(os.path.join('../data/source', data_file))
target_name = 'fully_funded'
num_list = ['school_latitude', 'school_longitude',
            'total_price_including_optional_support', 'students_reached']
In [141]:
df['date_posted'] = pd.to_datetime(df['date_posted'])
df['datefullyfunded'] = pd.to_datetime(df['datefullyfunded'])
df['fully_funded'] = (df['datefullyfunded'] - df['date_posted']) < timedelta(days=60)
# np.float was removed in NumPy 1.24; the builtin float is the same 64-bit type
df = df.astype({'school_latitude': float, 'school_longitude': float, 'fully_funded': float,
                'total_price_including_optional_support': float, 'students_reached': float})

target, features = prep.target_features_split(target_name, df)
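# _num_cat_split is a project helper that is not defined in this notebook (it may live in prep);
# it is expected to split `features` into categorical columns and the numeric columns in num_list
cat, numeric = _num_cat_split(features, num_list)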
In [149]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 124976 entries, 0 to 124975
Data columns (total 27 columns):
projectid                                 124976 non-null object
teacher_acctid                            124976 non-null object
schoolid                                  124976 non-null object
school_ncesid                             115743 non-null float64
school_latitude                           124976 non-null float64
school_longitude                          124976 non-null float64
school_city                               124976 non-null object
school_state                              124976 non-null object
school_metro                              109752 non-null object
school_district                           124804 non-null object
school_county                             124976 non-null object
school_charter                            124976 non-null object
school_magnet                             124976 non-null object
teacher_prefix                            124976 non-null object
primary_focus_subject                     124961 non-null object
primary_focus_area                        124961 non-null object
secondary_focus_subject                   84420 non-null object
secondary_focus_area                      84420 non-null object
resource_type                             124959 non-null object
poverty_level                             124976 non-null object
grade_level                               124973 non-null object
total_price_including_optional_support    124976 non-null float64
students_reached                          124917 non-null float64
eligible_double_your_impact_match         124976 non-null object
date_posted                               124976 non-null datetime64[ns]
datefullyfunded                           124976 non-null datetime64[ns]
fully_funded                              124976 non-null float64
dtypes: datetime64[ns](2), float64(6), object(19)
memory usage: 25.7+ MB

Phase 2: Data Exploration

In [234]:
# df_cleaned is created in Phase 3 (In [192]); this cell was run after that step
__df = df_cleaned.drop(['date_posted', 'datefullyfunded'], axis=1)
In [79]:
import seaborn as sns
import explore

sns.set(style='whitegrid')
%config InlineBackend.figure_format = 'retina'

2.1 Check the Distribution for the Target

In [181]:
explore.count_plot(target_name, target, 'Count Plot for the Target')

2.2 Check the Distribution for Selected Categorical Columns of Features

In [120]:
explore.count_plot('grade_level', features, 'Count Plot for Grade Levels')
In [121]:
explore.count_plot('school_charter', features, 'Count Plot for Charter Schools')
In [122]:
explore.count_plot('school_magnet', features, 'Count Plot for Magnet Schools')
In [123]:
explore.count_plot('teacher_prefix', features, 'Count Plot for Teacher Prefixes')
In [124]:
explore.count_plot('school_metro', features, 'Count Plot for School Metro Types')
In [125]:
plt.rcParams['figure.figsize'] = (14.0, 5.0)  # wider figures for columns with many categories
explore.count_plot('primary_focus_area', features, 'Count Plot for Primary Focus Areas')
In [126]:
explore.count_plot('resource_type', features, 'Count Plot for Resource Types')
In [127]:
explore.count_plot('poverty_level', features, 'Count Plot for Poverty Levels')

2.3 Check the Distribution and Outliers for Numeric Data

In [131]:
explore.box_plot(numeric)

Given the plot above, we can see that there are some outliers in the data, so it is useful to filter them out before looking at the distributions. The following function plots the distribution of each numeric feature using only the data between the 0.05 and 0.95 quantiles.

In [132]:
explore.dist_plot(numeric)
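A minimal sketch of the quantile trimming described above, assuming `explore.dist_plot` applies something similar to each numeric column before plotting (its code is not shown here). `trim_to_quantiles` is a hypothetical helper, and the toy series stands in for a column such as `students_reached`.

```python
import pandas as pd


def trim_to_quantiles(series, lower=0.05, upper=0.95):
    """Keep only the values of `series` that lie between its lower and upper quantiles (inclusive)."""
    lower_bound, upper_bound = series.quantile([lower, upper])
    return series[(series >= lower_bound) & (series <= upper_bound)]


# Toy column: the values 1..100 plus one extreme value
s = pd.Series(list(range(1, 101)) + [10_000])
trimmed = trim_to_quantiles(s)
print(trimmed.min(), trimmed.max())  # 6 96
```

Trimming is used here only for plotting; the underlying data is left unchanged.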

2.4 Check the Correlation Matrix of Numeric Columns

In [150]:
explore.corr_plot(df)

2.5 Drop Some Features with High Correlations

In this sample, we do not find any pair of features that are highly correlated with one another, so we do not need to drop any features this time.
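To make the "no highly correlated pair" check explicit, here is a small sketch that lists numeric column pairs whose absolute Pearson correlation exceeds a threshold. The 0.8 cut-off and the helper name `high_corr_pairs` are assumptions; the notebook does not state which threshold it used.

```python
import pandas as pd


def high_corr_pairs(frame, threshold=0.8):
    """Return (col_a, col_b, r) for numeric column pairs with |Pearson r| above threshold."""
    corr = frame.select_dtypes(include="number").corr()
    cols = corr.columns
    pairs = []
    # Walk the strict upper triangle so each pair is reported once and the diagonal is skipped
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) > threshold:
                pairs.append((cols[i], cols[j], round(float(r), 3)))
    return pairs


toy = pd.DataFrame({
    "price": [100, 200, 300, 400, 500],
    "price_with_support": [115, 230, 345, 460, 575],  # exactly 1.15 * price
    "students": [30, 10, 50, 20, 40],                  # r = 0.3 with price
})
print(high_corr_pairs(toy))
```

An empty list on the real numeric columns would correspond to the conclusion above.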

2.6 Draw Pair Plot for Numeric Features and Check the Possibility for a Machine Learning Method

In [151]:
temp = list(features.columns)
temp.append(target_name)
sorted_data = df[temp]
explore.pair_plot(sorted_data, target_name)

From the plots, we can see that the two classes of the target seem to be somewhat separated by some of the features, e.g. Total Price Including Optional Support. This suggests that a machine learning algorithm may be able to separate the data by label.

Note that it is also possible to transform categorical data into numeric (i.e., dummy) data and generate the pair plot. This step is skipped here due to limited space for images; the code part of this project provides such a method.

Phase 3: Feature Engineering

3.0 General Inspection

In [155]:
import feature
In [192]:
list_to_drop = ['projectid', 'teacher_acctid', 'schoolid', 'school_ncesid', 'school_city', 'school_district',
               'school_state', 'school_county']
df_cleaned = df.drop(list_to_drop, axis=1)

3.1 Cut Outliers for Certain Features

By inspecting the distribution of numeric features, I decided not to cut outliers, as they are very few and represent meaningful information. Although this step is skipped here, you can find the relevant function in the code section of the project.
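The project's own outlier function is not shown here. As one option, the sketch below caps (winsorizes) extreme values instead of dropping rows, so no projects are lost; `cap_outliers` and the 0.01/0.99 quantiles are assumptions.

```python
import pandas as pd


def cap_outliers(frame, columns, lower=0.01, upper=0.99):
    """Return a copy with each listed column clipped to its [lower, upper] quantile range."""
    out = frame.copy()
    for col in columns:
        lower_bound, upper_bound = out[col].quantile([lower, upper])
        out[col] = out[col].clip(lower=lower_bound, upper=upper_bound)
    return out


toy = pd.DataFrame({"students_reached": list(range(1, 101)) + [10_000]})
capped = cap_outliers(toy, ["students_reached"])
print(capped["students_reached"].min(), capped["students_reached"].max())
```

Capping keeps the row count unchanged, which matters when every project must receive a prediction.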

3.2 Discretize Continuous Features

By inspecting the content of numeric features, I decided that they should be treated as continuous rather than discrete variables, so this step is skipped here. You can find the relevant function in the code section of the project.
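For reference, a sketch of how a continuous feature could be discretized with pandas, using quantile-based bins (`pd.qcut`) or equal-width bins (`pd.cut`). The helper `discretize`, the four bins and the labels are illustrative assumptions, not the project's function.

```python
import pandas as pd


def discretize(series, bins=4, labels=None, equal_frequency=True):
    """Bin a continuous series: equal-frequency bins via qcut, or equal-width bins via cut."""
    if equal_frequency:
        return pd.qcut(series, q=bins, labels=labels)
    return pd.cut(series, bins=bins, labels=labels)


prices = pd.Series([120.0, 250.0, 310.0, 480.0, 560.0, 900.0, 1500.0, 3200.0])
price_bins = discretize(prices, bins=4, labels=["low", "mid", "high", "very high"]).tolist()
print(price_bins)
```

Equal-frequency bins are usually the safer default for skewed columns such as total price, because equal-width bins would put most projects into the lowest bin.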

3.3 Transform Categorical Features to Dummies

In [195]:
df_cleaned = feature.one_hot_encoding_all(df_cleaned)
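`feature.one_hot_encoding_all` is the project's own function. A plausible equivalent with `pd.get_dummies` is sketched below, assuming every column that is neither numeric nor datetime gets encoded; the `uint8` dtype and `column_value` naming match the `df_cleaned.info()` output (e.g. `school_metro_rural`, `eligible_double_your_impact_match_t`). Missing categories become all-zero rows, which is consistent with only `students_reached` still having nulls after encoding.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def one_hot_encode_all(frame):
    """One-hot encode every non-numeric, non-datetime column; NaN gets no dummy column."""
    cat_cols = [c for c in frame.columns
                if not is_numeric_dtype(frame[c]) and not is_datetime64_any_dtype(frame[c])]
    return pd.get_dummies(frame, columns=cat_cols, dtype=np.uint8)


toy = pd.DataFrame({
    "students_reached": [30.0, 12.0, 80.0],
    "school_metro": ["urban", "rural", None],
    "eligible_double_your_impact_match": ["t", "f", "t"],
})
encoded = one_hot_encode_all(toy)
print(list(encoded.columns))
```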
In [196]:
df_cleaned.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 124976 entries, 0 to 124975
Columns: 102 entries, school_latitude to eligible_double_your_impact_match_t
dtypes: datetime64[ns](2), float64(5), uint8(95)
memory usage: 18.0 MB
In [201]:
null_vars = list(df_cleaned.columns[df_cleaned.isnull().any()])
null_vars
Out[201]:
['students_reached']

3.4 Fill Null Values

In [203]:
# fill_null is a project helper that is not defined in this notebook (it may live in feature)
df_cleaned = fill_null(df_cleaned)
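`fill_null` itself is not shown in this notebook. A minimal sketch, assuming numeric nulls (here only `students_reached`) are replaced with the column median; the helper name and the choice of the median are assumptions.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def fill_null_with_median(frame):
    """Return a copy where each numeric column containing nulls is filled with its median."""
    out = frame.copy()
    for col in out.columns[out.isnull().any()]:
        if is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
    return out


toy = pd.DataFrame({"students_reached": [20.0, None, 40.0, 100.0],
                    "fully_funded": [1.0, 0.0, 1.0, 1.0]})
filled = fill_null_with_median(toy)
print(filled["students_reached"].tolist())  # median of 20, 40, 100 is 40
```

Computing the median on the whole dataset lets the later test periods influence the training data; computing it within each training window would avoid that.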
In [205]:
df_cleaned.columns
Out[205]:
Index(['school_latitude', 'school_longitude',
       'total_price_including_optional_support', 'students_reached',
       'date_posted', 'datefullyfunded', 'fully_funded', 'school_metro_rural',
       'school_metro_suburban', 'school_metro_urban',
       ...
       'poverty_level_high poverty', 'poverty_level_highest poverty',
       'poverty_level_low poverty', 'poverty_level_moderate poverty',
       'grade_level_Grades 3-5', 'grade_level_Grades 6-8',
       'grade_level_Grades 9-12', 'grade_level_Grades PreK-2',
       'eligible_double_your_impact_match_f',
       'eligible_double_your_impact_match_t'],
      dtype='object', length=102)

Phase 4: Model Training and Evaluation

In [287]:
import model
import graphviz

from sklearn import svm
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import ParameterGrid
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import export_graphviz
from sklearn.metrics import *
In [335]:
models = ['LR', 'KNN', 'DT', 'RF', 'AB', 'BAG']

clfs = {'LR': LogisticRegression(solver='liblinear'),  # liblinear supports both 'l1' and 'l2' penalties
        'KNN': KNeighborsClassifier(),
        'DT': DecisionTreeClassifier(), 
        'RF': RandomForestClassifier(), 
        'AB': AdaBoostClassifier(),
        'BAG': BaggingClassifier()}
    
grid = {'LR': { 'penalty': ['l1','l2'], 'C': [0.01,0.1,1,10]},
        'KNN' :{'n_neighbors': [1,5,10,25,50,100], 'weights': ['uniform', 'distance'], 
                'algorithm': ['auto','ball_tree','kd_tree']},
        'DT': {'criterion': ['gini', 'entropy'], 'max_depth': [1,5,10,20,50], 'min_samples_split': [5,25]},
        'RF':{'n_estimators': [10,100], 'max_depth': [5,25], 'max_features': ['sqrt','log2'], 
              'min_samples_split': [5,10], 'n_jobs': [-1]},
        'AB': { 'algorithm': ['SAMME', 'SAMME.R'], 'n_estimators': [1,10,100]},  # 'SAMME.R' is removed in scikit-learn >= 1.6
        'BAG': {'n_estimators': [10,100]}}
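A sketch of how this grid expands, using scikit-learn's `ParameterGrid` and `clone`. Counting the combinations gives 8 + 36 + 20 + 16 + 6 + 2 = 88, which matches the 88 rows in each results table. In those tables every LR row's `clf` reads `LogisticRegression(C=10, ...)` and every KNN row reads `algorithm='kd_tree'`, the last value in each grid, which suggests a single estimator object was re-parameterised and stored by reference; the `parameters` column is therefore the reliable record for each row. Cloning the base estimator for each setting keeps every stored model distinct. This is an assumption about how a loop like `clfs_loop_temporal` could be written, not its actual code.

```python
from sklearn.base import clone
from sklearn.model_selection import ParameterGrid
from sklearn.tree import DecisionTreeClassifier

# Copy of the search space defined above, so this sketch runs on its own
search_space = {
    'LR': {'penalty': ['l1', 'l2'], 'C': [0.01, 0.1, 1, 10]},
    'KNN': {'n_neighbors': [1, 5, 10, 25, 50, 100], 'weights': ['uniform', 'distance'],
            'algorithm': ['auto', 'ball_tree', 'kd_tree']},
    'DT': {'criterion': ['gini', 'entropy'], 'max_depth': [1, 5, 10, 20, 50],
           'min_samples_split': [5, 25]},
    'RF': {'n_estimators': [10, 100], 'max_depth': [5, 25], 'max_features': ['sqrt', 'log2'],
           'min_samples_split': [5, 10], 'n_jobs': [-1]},
    'AB': {'algorithm': ['SAMME', 'SAMME.R'], 'n_estimators': [1, 10, 100]},
    'BAG': {'n_estimators': [10, 100]},
}

# Number of parameter combinations per model family
sizes = {name: len(ParameterGrid(space)) for name, space in search_space.items()}
print(sizes, sum(sizes.values()))  # total is 88

# One independent, unfitted estimator per combination: clone() copies the base estimator
# and set_params() returns that copy, so earlier entries are never overwritten
base = DecisionTreeClassifier()
candidates = [(params, clone(base).set_params(**params))
              for params in ParameterGrid(search_space['DT'])]
```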
In [329]:
train_sets, test_sets, times = model.temporal_train_test_split(df_cleaned, 'date_posted', period='6M')
train_sets, test_sets = model.drop_time_col(train_sets, test_sets, 'datefullyfunded')
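# clfs_loop_temporal is a project helper that is not defined in this notebook (it may live in model)
results_df = clfs_loop_temporal(train_sets, test_sets, 'fully_funded', models, clfs, grid)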
LR
KNN
DT
RF
AB
BAG
LR
KNN
DT
RF
AB
BAG
LR
KNN
DT
RF
AB
BAG
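`model.temporal_train_test_split` is not shown in this notebook; judging from the three LR ... BAG passes printed above, it produces three train/test pairs. A minimal sketch of expanding-window splits with 6-month test windows follows. The window boundaries and the 60-day gap before each test window (so that training rows already have an observable `fully_funded` label) are assumptions, not necessarily what the project's function does.

```python
import pandas as pd


def temporal_splits(frame, date_col, start, end, test_months=6, gap_days=60):
    """Expanding-window temporal splits.

    Each test window covers `test_months` months; the training set holds every row
    posted at least `gap_days` days before the window starts, so its 60-day
    `fully_funded` label would already be known at that point.
    """
    splits = []
    test_start, end = pd.Timestamp(start), pd.Timestamp(end)
    while test_start < end:
        test_end = min(test_start + pd.DateOffset(months=test_months), end)
        train_end = test_start - pd.Timedelta(days=gap_days)
        train = frame[frame[date_col] < train_end]
        test = frame[(frame[date_col] >= test_start) & (frame[date_col] < test_end)]
        splits.append((train, test, (test_start, test_end)))
        test_start = test_end
    return splits


# Toy data: one project per day over 2012-2013
toy = pd.DataFrame({"date_posted": pd.date_range("2012-01-01", "2013-12-31", freq="D")})
splits = temporal_splits(toy, "date_posted", "2012-07-01", "2014-01-01")
for train, test, (t0, t1) in splits:
    print(t0.date(), t1.date(), len(train), len(test))
```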
In [331]:
table_0 = results_df[0]
table_0
Out[331]:
model_type clf parameters auc-roc p_at_1 p_at_2 p_at_5 p_at_10 p_at_20 p_at_30 p_at_50
0 LR LogisticRegression(C=10, class_weight=None, du... {'C': 0.01, 'penalty': 'l1'} 0.615814 0.929907 0.888112 0.869403 0.842890 0.808625 0.779798 0.752914
1 LR LogisticRegression(C=10, class_weight=None, du... {'C': 0.01, 'penalty': 'l2'} 0.630750 0.892523 0.848485 0.861007 0.846620 0.817949 0.797047 0.761305
2 LR LogisticRegression(C=10, class_weight=None, du... {'C': 0.1, 'penalty': 'l1'} 0.627076 0.747664 0.776224 0.835821 0.829837 0.805828 0.795493 0.760466
3 LR LogisticRegression(C=10, class_weight=None, du... {'C': 0.1, 'penalty': 'l2'} 0.629621 0.733645 0.783217 0.829291 0.832168 0.806993 0.792696 0.761305
4 LR LogisticRegression(C=10, class_weight=None, du... {'C': 1, 'penalty': 'l1'} 0.629203 0.771028 0.755245 0.795709 0.818182 0.806527 0.793007 0.762145
5 LR LogisticRegression(C=10, class_weight=None, du... {'C': 1, 'penalty': 'l2'} 0.628710 0.742991 0.771562 0.802239 0.821445 0.807692 0.789277 0.762238
6 LR LogisticRegression(C=10, class_weight=None, du... {'C': 10, 'penalty': 'l1'} 0.628008 0.794393 0.759907 0.800373 0.819580 0.808625 0.789899 0.761399
7 LR LogisticRegression(C=10, class_weight=None, du... {'C': 10, 'penalty': 'l2'} 0.629100 0.747664 0.771562 0.805970 0.820513 0.809091 0.790054 0.762890
8 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 1, 'weigh... 0.523353 0.995327 0.997669 0.999067 0.999534 0.940793 0.960373 0.976131
9 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 1, 'weigh... 0.523353 0.995327 0.997669 0.999067 0.999534 0.940793 0.960373 0.976131
10 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 5, 'weigh... 0.555352 1.000000 0.941725 0.955224 0.576690 0.688811 0.792541 0.808392
11 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 5, 'weigh... 0.555269 1.000000 0.941725 0.953358 0.576224 0.688345 0.732556 0.713473
12 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 10, 'weig... 0.565838 1.000000 1.000000 0.628731 0.814452 0.671562 0.736131 0.734266
13 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 10, 'weig... 0.565271 1.000000 1.000000 0.625933 0.784615 0.761538 0.740016 0.717855
14 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 25, 'weig... 0.580995 0.822430 0.727273 0.854478 0.807459 0.797902 0.763636 0.737529
15 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 25, 'weig... 0.578846 0.785047 0.797203 0.808769 0.798601 0.779720 0.755866 0.725128
16 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 50, 'weig... 0.588237 0.803738 0.790210 0.844216 0.758974 0.753147 0.751671 0.733986
17 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 50, 'weig... 0.586580 0.827103 0.836830 0.817164 0.800466 0.788578 0.761150 0.731189
18 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 100, 'wei... 0.590899 0.855140 0.825175 0.835821 0.796270 0.801399 0.757731 0.745455
19 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 100, 'wei... 0.590466 0.873832 0.834499 0.824627 0.815851 0.801865 0.764880 0.735012
20 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 1, '... 0.523353 0.995327 0.997669 0.999067 0.999534 0.940793 0.960373 0.976131
21 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 1, '... 0.523353 0.995327 0.997669 0.999067 0.999534 0.940793 0.960373 0.976131
22 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 5, '... 0.555286 1.000000 0.941725 0.955224 0.576690 0.688811 0.792541 0.808298
23 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 5, '... 0.555223 1.000000 0.941725 0.953358 0.576224 0.688345 0.732556 0.713473
24 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 10, ... 0.565819 1.000000 1.000000 0.628731 0.814452 0.671562 0.736131 0.734266
25 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 10, ... 0.565255 1.000000 1.000000 0.625933 0.784615 0.761538 0.740016 0.717855
26 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 25, ... 0.581002 0.822430 0.727273 0.854478 0.807459 0.798368 0.763326 0.737436
27 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 25, ... 0.578850 0.785047 0.797203 0.808769 0.798601 0.779953 0.756022 0.725128
28 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 50, ... 0.588245 0.803738 0.790210 0.844216 0.758974 0.752914 0.751671 0.734079
29 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 50, ... 0.586588 0.827103 0.836830 0.817164 0.800000 0.788578 0.761150 0.731189
... ... ... ... ... ... ... ... ... ... ... ...
58 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 10, 'min... 0.595308 1.000000 1.000000 1.000000 0.620979 0.729138 0.787879 0.740606
59 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 10, 'min... 0.598693 1.000000 1.000000 1.000000 0.638695 0.777622 0.792541 0.734918
60 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 20, 'min... 0.568769 0.995327 0.997669 0.999067 0.999534 0.687413 0.722455 0.833380
61 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 20, 'min... 0.584218 1.000000 1.000000 1.000000 0.979021 0.527273 0.684848 0.741445
62 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 50, 'min... 0.557523 0.995327 0.997669 0.999067 0.999534 0.889510 0.926185 0.955618
63 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 50, 'min... 0.580076 1.000000 1.000000 0.940299 0.970163 0.515152 0.636519 0.740699
64 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'sqrt', 'min_... 0.597470 0.934579 0.899767 0.848881 0.800000 0.770629 0.764258 0.748345
65 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'sqrt', 'min_... 0.607809 0.799065 0.832168 0.828358 0.803263 0.770629 0.761461 0.751608
66 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'sqrt', 'min_... 0.594054 0.742991 0.750583 0.798507 0.780420 0.770862 0.758819 0.741725
67 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'sqrt', 'min_... 0.594048 0.757009 0.787879 0.814366 0.795338 0.765501 0.749961 0.734825
68 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'log2', 'min_... 0.585685 0.845794 0.829837 0.798507 0.770163 0.763636 0.752292 0.732028
69 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'log2', 'min_... 0.590310 0.691589 0.731935 0.776119 0.785082 0.761305 0.746387 0.737156
70 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'log2', 'min_... 0.571782 0.682243 0.708625 0.718284 0.739860 0.749184 0.739083 0.726713
71 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'log2', 'min_... 0.589309 0.752336 0.787879 0.805037 0.783217 0.766667 0.756643 0.731655
72 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'sqrt', 'min... 0.599096 0.813084 0.825175 0.810634 0.788345 0.772261 0.766744 0.743310
73 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'sqrt', 'min... 0.615343 0.752336 0.813520 0.798507 0.806993 0.792075 0.778399 0.750769
74 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'sqrt', 'min... 0.601104 0.766355 0.780886 0.786381 0.788811 0.778788 0.762393 0.743403
75 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'sqrt', 'min... 0.616033 0.771028 0.787879 0.810634 0.805128 0.795105 0.777001 0.750676
76 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'log2', 'min... 0.586006 0.775701 0.785548 0.791978 0.775291 0.766667 0.751981 0.733427
77 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'log2', 'min... 0.613449 0.766355 0.785548 0.804104 0.800932 0.788578 0.780886 0.753193
78 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'log2', 'min... 0.598078 0.752336 0.769231 0.768657 0.772960 0.765734 0.759596 0.740140
79 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'log2', 'min... 0.613414 0.785047 0.790210 0.809701 0.804196 0.786480 0.774359 0.751981
80 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME', 'n_estimators': 1} 0.520968 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
81 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME', 'n_estimators': 10} 0.582554 0.710280 0.853147 0.807836 0.842424 0.772727 0.746698 0.771282
82 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME', 'n_estimators': 100} 0.626284 0.761682 0.734266 0.781716 0.810256 0.835198 0.811189 0.744988
83 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME.R', 'n_estimators': 1} 0.520968 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
84 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME.R', 'n_estimators': 10} 0.621952 0.841121 0.792541 0.837687 0.801865 0.803030 0.816162 0.768205
85 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME.R', 'n_estimators': 100} 0.641809 0.766355 0.811189 0.833955 0.851748 0.834033 0.817405 0.772867
86 BAG (DecisionTreeClassifier(class_weight=None, cri... {'n_estimators': 10} 0.593860 1.000000 1.000000 0.632463 0.733333 0.698368 0.764258 0.761492
87 BAG (DecisionTreeClassifier(class_weight=None, cri... {'n_estimators': 100} 0.614320 0.887850 0.883450 0.863806 0.837296 0.790443 0.776845 0.750676

88 rows × 11 columns
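The `p_at_k` columns report precision among the top k% of test projects ranked by predicted score, and `auc-roc` is the area under the ROC curve. The project's metric code is not shown; a minimal sketch, assuming scores come from `predict_proba(...)[:, 1]`, is below (`precision_at_k` is a hypothetical helper). When a model produces only a handful of distinct scores, many projects tie at the cutoff and precision at k depends on how ties are broken; this may explain rows such as KNN with `n_neighbors=1` or AdaBoost with `n_estimators=1`, which report `p_at_k` values at or near 1.000000 alongside an AUC of about 0.52.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def precision_at_k(y_true, y_scores, k_pct):
    """Share of positives among the top k_pct percent of rows ranked by score.

    Ties are broken by the stable sort order of the input, which matters when a
    model outputs only a few distinct scores.
    """
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_scores, dtype=float), kind="stable")
    n_top = max(1, int(len(y_true) * k_pct / 100.0))
    return float(y_true[order[:n_top]].mean())


y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(precision_at_k(y_true, y_scores, 20))  # 0.5
print(precision_at_k(y_true, y_scores, 50))  # 0.6
print(roc_auc_score(y_true, y_scores))
```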

In [332]:
table_1 = results_df[1]
table_1
Out[332]:
model_type clf parameters auc-roc p_at_1 p_at_2 p_at_5 p_at_10 p_at_20 p_at_30 p_at_50
0 LR LogisticRegression(C=10, class_weight=None, du... {'C': 0.01, 'penalty': 'l1'} 0.643263 0.934132 0.926756 0.925837 0.907324 0.876102 0.859805 0.824486
1 LR LogisticRegression(C=10, class_weight=None, du... {'C': 0.01, 'penalty': 'l2'} 0.647262 0.925150 0.935725 0.917464 0.900448 0.879839 0.861897 0.825502
2 LR LogisticRegression(C=10, class_weight=None, du... {'C': 0.1, 'penalty': 'l1'} 0.649900 0.913174 0.926756 0.915072 0.897160 0.880735 0.862296 0.827774
3 LR LogisticRegression(C=10, class_weight=None, du... {'C': 0.1, 'penalty': 'l2'} 0.646610 0.925150 0.914798 0.901914 0.890583 0.876551 0.858310 0.826339
4 LR LogisticRegression(C=10, class_weight=None, du... {'C': 1, 'penalty': 'l1'} 0.648186 0.883234 0.895366 0.893541 0.886398 0.877746 0.859306 0.828252
5 LR LogisticRegression(C=10, class_weight=None, du... {'C': 1, 'penalty': 'l2'} 0.647310 0.898204 0.893871 0.894737 0.885800 0.876551 0.858709 0.826937
6 LR LogisticRegression(C=10, class_weight=None, du... {'C': 10, 'penalty': 'l1'} 0.647891 0.883234 0.890882 0.891148 0.885501 0.876700 0.859107 0.828013
7 LR LogisticRegression(C=10, class_weight=None, du... {'C': 10, 'penalty': 'l2'} 0.647449 0.889222 0.893871 0.893541 0.885800 0.876252 0.858111 0.827176
8 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 1, 'weigh... 0.540009 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
9 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 1, 'weigh... 0.540009 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
10 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 5, 'weigh... 0.585809 1.000000 1.000000 0.980263 0.666368 0.833209 0.795038 0.800395
11 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 5, 'weigh... 0.586036 1.000000 1.000000 0.980263 0.666368 0.827828 0.804404 0.792025
12 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 10, 'weig... 0.601752 1.000000 0.780269 0.837919 0.883707 0.847258 0.762654 0.857604
13 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 10, 'weig... 0.602378 1.000000 0.780269 0.837919 0.860688 0.845912 0.825329 0.797645
14 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 25, 'weig... 0.625551 0.952096 0.898356 0.888756 0.872646 0.884023 0.849641 0.836502
15 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 25, 'weig... 0.624816 0.931138 0.926756 0.918062 0.897160 0.872964 0.849143 0.812530
16 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 50, 'weig... 0.636351 0.904192 0.938714 0.921053 0.901943 0.871768 0.854723 0.838654
17 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 50, 'weig... 0.635518 0.934132 0.931241 0.924641 0.920179 0.882977 0.859306 0.817671
18 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 100, 'wei... 0.643023 0.904192 0.928251 0.937201 0.914499 0.887162 0.867776 0.826100
19 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 100, 'wei... 0.642463 0.922156 0.928251 0.937201 0.922272 0.891795 0.866381 0.821676
20 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 1, '... 0.540029 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
21 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 1, '... 0.540029 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
22 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 5, '... 0.585769 1.000000 1.000000 0.980263 0.666069 0.833059 0.794938 0.800395
23 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 5, '... 0.586015 1.000000 1.000000 0.980263 0.666069 0.827679 0.804404 0.792025
24 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 10, ... 0.601687 1.000000 0.780269 0.837919 0.883707 0.847258 0.762455 0.857484
25 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 10, ... 0.602326 1.000000 0.780269 0.837919 0.860688 0.845912 0.825229 0.797645
26 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 25, ... 0.625528 0.952096 0.898356 0.888756 0.872646 0.883724 0.849641 0.836502
27 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 25, ... 0.624800 0.931138 0.926756 0.918062 0.897160 0.872814 0.849043 0.812530
28 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 50, ... 0.636342 0.904192 0.938714 0.921053 0.901943 0.871619 0.854723 0.838714
29 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 50, ... 0.635513 0.934132 0.931241 0.924641 0.920179 0.882977 0.859306 0.817671
... ... ... ... ... ... ... ... ... ... ... ...
58 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 10, 'min... 0.638625 1.000000 1.000000 0.754785 0.865770 0.876102 0.870865 0.824725
59 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 10, 'min... 0.638512 1.000000 1.000000 0.789474 0.874439 0.876252 0.869769 0.824845
60 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 20, 'min... 0.580493 1.000000 1.000000 1.000000 1.000000 0.680167 0.690016 0.798661
61 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 20, 'min... 0.597035 1.000000 1.000000 1.000000 0.780269 0.733822 0.822539 0.802427
62 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 50, 'min... 0.561997 0.997006 0.998505 0.999402 0.999701 0.986101 0.990634 0.994321
63 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 50, 'min... 0.591135 1.000000 1.000000 1.000000 0.931241 0.659094 0.772718 0.795253
64 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'sqrt', 'min_... 0.647113 0.961078 0.952167 0.935407 0.911510 0.881184 0.856616 0.824845
65 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'sqrt', 'min_... 0.668488 0.973054 0.959641 0.949163 0.928849 0.906292 0.877640 0.839431
66 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'sqrt', 'min_... 0.664038 0.979042 0.962631 0.946770 0.920478 0.900463 0.877043 0.835067
67 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'sqrt', 'min_... 0.670836 0.964072 0.970105 0.947967 0.934230 0.909729 0.882623 0.838654
68 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'log2', 'min_... 0.660545 0.925150 0.926756 0.921053 0.910314 0.886415 0.868772 0.832317
69 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'log2', 'min_... 0.664858 0.976048 0.962631 0.946770 0.929447 0.905844 0.877043 0.833752
70 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'log2', 'min_... 0.644049 0.955090 0.940209 0.924043 0.900448 0.874159 0.856417 0.824546
71 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'log2', 'min_... 0.665241 0.973054 0.968610 0.951555 0.930942 0.905993 0.876943 0.836263
72 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'sqrt', 'min... 0.635386 0.934132 0.922272 0.905502 0.886697 0.865640 0.848645 0.819225
73 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'sqrt', 'min... 0.669383 0.976048 0.955157 0.929426 0.924066 0.904499 0.881427 0.837697
74 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'sqrt', 'min... 0.643065 0.970060 0.943199 0.924641 0.903438 0.875803 0.857314 0.821676
75 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'sqrt', 'min... 0.673535 0.958084 0.958146 0.941986 0.929746 0.907787 0.882523 0.840567
76 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'log2', 'min... 0.633658 0.907186 0.907324 0.890550 0.885501 0.864744 0.849442 0.817492
77 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'log2', 'min... 0.664918 0.958084 0.947683 0.931220 0.921973 0.897624 0.877043 0.835605
78 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'log2', 'min... 0.643220 0.955090 0.943199 0.919258 0.900747 0.878942 0.858210 0.824067
79 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'log2', 'min... 0.671610 0.964072 0.940209 0.927033 0.924664 0.903751 0.879833 0.839969
80 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME', 'n_estimators': 1} 0.605574 1.000000 1.000000 1.000000 0.954260 0.710208 0.806796 0.884027
81 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME', 'n_estimators': 10} 0.655353 0.937126 0.968610 0.919258 0.904335 0.879091 0.870666 0.846126
82 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME', 'n_estimators': 100} 0.663087 0.889222 0.935725 0.941388 0.933632 0.906591 0.878836 0.829866
83 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME.R', 'n_estimators': 1} 0.605574 1.000000 1.000000 1.000000 0.954260 0.710208 0.806796 0.884027
84 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME.R', 'n_estimators': 10} 0.657813 0.934132 0.955157 0.930622 0.915695 0.895980 0.874552 0.836083
85 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME.R', 'n_estimators': 100} 0.670318 0.946108 0.952167 0.943780 0.924963 0.905246 0.884914 0.837697
86 BAG (DecisionTreeClassifier(class_weight=None, cri... {'n_estimators': 10} 0.623661 1.000000 1.000000 0.754187 0.877130 0.814079 0.841271 0.854615
87 BAG (DecisionTreeClassifier(class_weight=None, cri... {'n_estimators': 100} 0.653657 0.949102 0.898356 0.900120 0.907922 0.888507 0.866281 0.831540

88 rows × 11 columns

In [334]:
table_2 = results_df[2]
table_2
Out[334]:
model_type clf parameters auc-roc p_at_1 p_at_2 p_at_5 p_at_10 p_at_20 p_at_30 p_at_50
0 LR LogisticRegression(C=10, class_weight=None, du... {'C': 0.01, 'penalty': 'l1'} 0.637597 0.906977 0.888631 0.866295 0.845012 0.817169 0.797525 0.767981
1 LR LogisticRegression(C=10, class_weight=None, du... {'C': 0.01, 'penalty': 'l2'} 0.633879 0.925581 0.911833 0.872795 0.837123 0.812297 0.790101 0.763619
2 LR LogisticRegression(C=10, class_weight=None, du... {'C': 0.1, 'penalty': 'l1'} 0.638774 0.888372 0.893271 0.881151 0.838979 0.817865 0.794586 0.767239
3 LR LogisticRegression(C=10, class_weight=None, du... {'C': 0.1, 'penalty': 'l2'} 0.635897 0.902326 0.900232 0.876509 0.838515 0.816241 0.791338 0.763434
4 LR LogisticRegression(C=10, class_weight=None, du... {'C': 1, 'penalty': 'l1'} 0.638609 0.888372 0.897912 0.878366 0.840835 0.818097 0.793658 0.765568
5 LR LogisticRegression(C=10, class_weight=None, du... {'C': 1, 'penalty': 'l2'} 0.637457 0.888372 0.893271 0.877437 0.839907 0.816937 0.792885 0.766125
6 LR LogisticRegression(C=10, class_weight=None, du... {'C': 10, 'penalty': 'l1'} 0.638509 0.888372 0.900232 0.879294 0.842691 0.817401 0.793813 0.765476
7 LR LogisticRegression(C=10, class_weight=None, du... {'C': 10, 'penalty': 'l2'} 0.637509 0.888372 0.895592 0.874652 0.840835 0.816937 0.791647 0.766311
8 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 1, 'weigh... 0.540322 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
9 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 1, 'weigh... 0.540322 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
10 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 5, 'weigh... 0.588454 1.000000 1.000000 1.000000 0.616705 0.731090 0.820727 0.818840
11 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 5, 'weigh... 0.588989 1.000000 1.000000 1.000000 0.616241 0.731090 0.759938 0.735963
12 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 10, 'weig... 0.609186 1.000000 1.000000 0.711235 0.855684 0.740603 0.796597 0.780046
13 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 10, 'weig... 0.609104 1.000000 1.000000 0.709378 0.840371 0.805104 0.782367 0.747007
14 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 25, 'weig... 0.631095 0.865116 0.932715 0.879294 0.856613 0.862645 0.788554 0.797587
15 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 25, 'weig... 0.630483 0.855814 0.909513 0.893222 0.875174 0.839443 0.807579 0.762135
16 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 50, 'weig... 0.642476 0.930233 0.902552 0.916435 0.881206 0.836891 0.821964 0.774942
17 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 50, 'weig... 0.641185 0.930233 0.928074 0.912721 0.889559 0.851740 0.820727 0.766775
18 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 100, 'wei... 0.647864 0.925581 0.932715 0.912721 0.887703 0.859397 0.822738 0.778933
19 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'auto', 'n_neighbors': 100, 'wei... 0.647228 0.925581 0.928074 0.922934 0.896056 0.856845 0.829080 0.772065
20 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 1, '... 0.540356 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
21 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 1, '... 0.540356 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
22 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 5, '... 0.588438 1.000000 1.000000 1.000000 0.616705 0.731090 0.820727 0.818840
23 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 5, '... 0.588977 1.000000 1.000000 1.000000 0.616241 0.731090 0.759938 0.735963
24 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 10, ... 0.609146 1.000000 1.000000 0.711235 0.855684 0.740371 0.796597 0.780046
25 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 10, ... 0.609077 1.000000 1.000000 0.709378 0.840371 0.804872 0.782367 0.747007
26 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 25, ... 0.631039 0.865116 0.932715 0.879294 0.856613 0.862645 0.788554 0.797587
27 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 25, ... 0.630433 0.855814 0.909513 0.893222 0.875174 0.839443 0.807579 0.762135
28 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 50, ... 0.642478 0.930233 0.902552 0.916435 0.881206 0.836891 0.822119 0.774942
29 KNN KNeighborsClassifier(algorithm='kd_tree', leaf... {'algorithm': 'ball_tree', 'n_neighbors': 50, ... 0.641185 0.930233 0.928074 0.912721 0.889559 0.851740 0.820727 0.766775
... ... ... ... ... ... ... ... ... ... ... ...
58 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 10, 'min... 0.653770 1.000000 0.849188 0.764160 0.849188 0.841067 0.829853 0.784780
59 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 10, 'min... 0.655187 1.000000 0.807425 0.789229 0.855220 0.839211 0.831245 0.784872
60 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 20, 'min... 0.598343 1.000000 1.000000 1.000000 1.000000 0.535963 0.676411 0.762042
61 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 20, 'min... 0.612851 1.000000 1.000000 1.000000 0.744780 0.684223 0.786234 0.760557
62 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 50, 'min... 0.560065 0.995349 0.997680 0.999071 0.999536 0.879118 0.919258 0.951462
63 DT DecisionTreeClassifier(class_weight=None, crit... {'criterion': 'entropy', 'max_depth': 50, 'min... 0.595632 1.000000 1.000000 1.000000 0.982831 0.538747 0.692498 0.747935
64 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'sqrt', 'min_... 0.657479 0.948837 0.935035 0.914578 0.892807 0.850116 0.818871 0.781903
65 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'sqrt', 'min_... 0.668522 0.939535 0.941995 0.930362 0.900232 0.857773 0.838360 0.785522
66 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'sqrt', 'min_... 0.659826 0.920930 0.904872 0.891365 0.873318 0.851276 0.831709 0.780603
67 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'sqrt', 'min_... 0.671323 0.911628 0.928074 0.930362 0.900696 0.865893 0.840371 0.790719
68 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'log2', 'min_... 0.635096 0.883721 0.895592 0.887651 0.862645 0.829930 0.804176 0.767796
69 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'log2', 'min_... 0.669266 0.920930 0.928074 0.931291 0.903944 0.856845 0.835576 0.787193
70 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'log2', 'min_... 0.647922 0.902326 0.904872 0.891365 0.877958 0.837587 0.809745 0.772807
71 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 5, 'max_features': 'log2', 'min_... 0.664761 0.888372 0.914153 0.916435 0.891879 0.857309 0.828461 0.785336
72 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'sqrt', 'min... 0.643911 0.893023 0.900232 0.881151 0.859861 0.832947 0.814695 0.772993
73 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'sqrt', 'min... 0.682361 0.944186 0.932715 0.917363 0.910905 0.873318 0.845321 0.796473
74 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'sqrt', 'min... 0.654879 0.920930 0.907193 0.886722 0.870070 0.842923 0.822738 0.775592
75 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'sqrt', 'min... 0.685155 0.944186 0.930394 0.914578 0.907193 0.874478 0.844393 0.800093
76 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'log2', 'min... 0.642883 0.893023 0.893271 0.870938 0.851972 0.828306 0.807734 0.770302
77 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'log2', 'min... 0.681424 0.916279 0.918794 0.903435 0.900232 0.868910 0.843774 0.796845
78 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'log2', 'min... 0.652632 0.920930 0.909513 0.878366 0.857541 0.837819 0.819180 0.777633
79 RF (DecisionTreeClassifier(class_weight=None, cri... {'max_depth': 25, 'max_features': 'log2', 'min... 0.678910 0.934884 0.921114 0.903435 0.887239 0.869374 0.840835 0.795360
80 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME', 'n_estimators': 1} 0.608332 1.000000 1.000000 1.000000 0.941995 0.633411 0.755607 0.853364
81 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME', 'n_estimators': 10} 0.657943 0.953488 0.914153 0.915506 0.903016 0.876566 0.803867 0.793782
82 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME', 'n_estimators': 100} 0.666639 0.953488 0.951276 0.943361 0.913225 0.870766 0.837278 0.784223
83 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME.R', 'n_estimators': 1} 0.608332 1.000000 1.000000 1.000000 0.941995 0.633411 0.755607 0.853364
84 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME.R', 'n_estimators': 10} 0.667361 0.958140 0.944316 0.935933 0.899304 0.872622 0.842537 0.782923
85 AB (DecisionTreeClassifier(class_weight=None, cri... {'algorithm': 'SAMME.R', 'n_estimators': 100} 0.683571 0.958140 0.951276 0.935005 0.913225 0.880510 0.847332 0.798979
86 BAG (DecisionTreeClassifier(class_weight=None, cri... {'n_estimators': 10} 0.631672 1.000000 1.000000 0.700093 0.820882 0.731555 0.804950 0.797494
87 BAG (DecisionTreeClassifier(class_weight=None, cri... {'n_estimators': 100} 0.668784 0.934884 0.902552 0.909006 0.883527 0.855220 0.831245 0.789698

88 rows × 11 columns
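Each table has 88 rows because it combines the parameter grids of the six model types. The sketch below expands the three grids that are fully visible in the parameters column (LR, AB and BAG) using scikit-learn's ParameterGrid. Whether clfs_loop_temporal actually uses ParameterGrid is an assumption. Its iteration order (keys sorted, last key varying fastest) does match the order of LR rows 0 to 7 and AB rows 80 to 85. The KNN, DT and RF grids are truncated in the printed output, so they are not reconstructed.

```python
from sklearn.model_selection import ParameterGrid

# Grids read off the 'parameters' column of the tables above.
# Only the LR, AB and BAG grids are fully visible; the KNN, DT and RF
# grids are truncated in the printed output and are not reproduced here.
grids = {
    "LR": {"C": [0.01, 0.1, 1, 10], "penalty": ["l1", "l2"]},
    "AB": {"algorithm": ["SAMME", "SAMME.R"], "n_estimators": [1, 10, 100]},
    "BAG": {"n_estimators": [10, 100]},
}

for model_type, grid in grids.items():
    # ParameterGrid expands the Cartesian product of the listed values
    print(model_type, len(ParameterGrid(grid)))

# First few LR combinations, in the same order as rows 0 to 7
for params in list(ParameterGrid(grids["LR"]))[:3]:
    print(params)
```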

In [341]:
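# ExcelWriter.save() is deprecated since pandas 1.5 and removed in 2.0;
# the context manager writes and closes the file on exit.
with pd.ExcelWriter('Model Evaluation.xlsx', engine='xlsxwriter') as writer:
    # One sheet per temporal test period
    table_0.to_excel(writer, sheet_name='Sheet 0')
    table_1.to_excel(writer, sheet_name='Sheet 1')
    table_2.to_excel(writer, sheet_name='Sheet 2')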
In [257]:
results_df = clfs_loop_temporal(train_sets, test_sets, 'fully_funded', models, clfs)
[Output omitted: for each of the three train/test splits, the loop prints the model types LR, KNN, DT, RF, AB and BAG in turn. Each name is followed by one empty figure (<Figure size 432x288 with 0 Axes>) per parameter combination: 8, 36, 20, 16, 6 and 2 respectively, for 88 per split.]
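clfs_loop_temporal is defined elsewhere in the project. The code below is a hypothetical stand-in, temporal_loop, that sketches the loop structure the output suggests. For each (train, test) split, it fits every model type with every parameter combination, scores the model on the test set and records one row. The sketch uses synthetic data and a single LR grid so that it runs on its own. The column names follow the results tables above.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterGrid


def p_at_k(y_true, scores, k_pct):
    # Precision among the top k_pct percent of rows by score (stable sort: ties keep input order)
    n_top = max(1, int(len(scores) * k_pct / 100.0))
    top = np.argsort(-np.asarray(scores), kind="stable")[:n_top]
    return np.asarray(y_true)[top].mean()


def temporal_loop(splits, target, clfs, grids, ks=(1, 2, 5, 10, 20, 30, 50)):
    """Hypothetical stand-in for clfs_loop_temporal: one results table per split."""
    tables = []
    for train, test in splits:
        X_train, y_train = train.drop(columns=target), train[target]
        X_test, y_test = test.drop(columns=target), test[target]
        rows = []
        for model_type, base_clf in clfs.items():
            for params in ParameterGrid(grids[model_type]):
                # Fresh, unfitted copy per combination so stored models keep their own parameters
                model = clone(base_clf).set_params(**params)
                model.fit(X_train, y_train)
                scores = model.predict_proba(X_test)[:, 1]
                row = {"model_type": model_type, "clf": model, "parameters": params,
                       "auc-roc": roc_auc_score(y_test, scores)}
                for k in ks:
                    row[f"p_at_{k}"] = p_at_k(y_test, scores, k)
                rows.append(row)
        tables.append(pd.DataFrame(rows))
    return tables


# Synthetic stand-in data: three features and a binary fully_funded label
rng = np.random.default_rng(0)


def make_split(n):
    X = rng.normal(size=(n, 3))
    y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    frame = pd.DataFrame(X, columns=["a", "b", "c"])
    frame["fully_funded"] = y
    return frame


splits = [(make_split(200), make_split(100)) for _ in range(3)]
clfs = {"LR": LogisticRegression(solver="liblinear")}
grids = {"LR": {"C": [0.01, 0.1, 1, 10], "penalty": ["l1", "l2"]}}
results = temporal_loop(splits, "fully_funded", clfs, grids)
print(len(results), [len(t) for t in results])
```

The sketch fits each combination on a fresh copy made with sklearn.base.clone. If a single estimator object is instead mutated with set_params and stored in every row, all rows display its final parameters. That may explain why every LR row in the clf column above reads C=10 while the parameters column varies.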

Phase 5: Report of the Best Models Under Different Metrics

In [413]:
import report
In [380]:
metrics = ['auc-roc', 'p_at_1', 'p_at_2', 'p_at_5', 'p_at_10', 'p_at_20', 'p_at_30', 'p_at_50']
tables = [table_0, table_1, table_2]
In [414]:
report.generate_whole_report(tables, metrics)
The best classifiers under different metrics: 

The 0th 6 months:
------------------------------------------------------
auc-roc: max = 0.64, models = ['AB']
p_at_1: max = 1.00, models = ['KNN' 'DT' 'AB' 'BAG']
p_at_2: max = 1.00, models = ['KNN' 'DT' 'AB' 'BAG']
p_at_5: max = 1.00, models = ['DT' 'AB']
p_at_10: max = 1.00, models = ['DT' 'AB']
p_at_20: max = 1.00, models = ['DT' 'AB']
p_at_30: max = 1.00, models = ['DT' 'AB']
p_at_50: max = 1.00, models = ['DT' 'AB']

The 1th 6 months:
------------------------------------------------------
auc-roc: max = 0.67, models = ['RF']
p_at_1: max = 1.00, models = ['KNN' 'DT' 'AB' 'BAG']
p_at_2: max = 1.00, models = ['KNN' 'DT' 'AB' 'BAG']
p_at_5: max = 1.00, models = ['KNN' 'DT' 'AB']
p_at_10: max = 1.00, models = ['KNN' 'DT']
p_at_20: max = 1.00, models = ['KNN']
p_at_30: max = 1.00, models = ['KNN']
p_at_50: max = 1.00, models = ['KNN']

The 2th 6 months:
------------------------------------------------------
auc-roc: max = 0.69, models = ['RF']
p_at_1: max = 1.00, models = ['KNN' 'DT' 'AB' 'BAG']
p_at_2: max = 1.00, models = ['KNN' 'DT' 'AB' 'BAG']
p_at_5: max = 1.00, models = ['KNN' 'DT' 'AB']
p_at_10: max = 1.00, models = ['KNN' 'DT']
p_at_20: max = 1.00, models = ['KNN']
p_at_30: max = 1.00, models = ['KNN']
p_at_50: max = 1.00, models = ['KNN']
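For each metric, the report lists the highest value and every model type that reaches it. The code below is a hypothetical pandas sketch of that summary. best_models is not the report module's code. It assumes exact equality with the maximum, which is consistent with the output above: several model types tie at 1.00 for precision, but only one wins on AUC-ROC. The toy table uses five rows copied from table_2.

```python
import pandas as pd


def best_models(table, metrics):
    """For each metric, return (maximum value, model types that reach it).

    Hypothetical sketch of the per-metric summary; not the report module's code.
    """
    summary = {}
    for metric in metrics:
        best = table[metric].max()
        # Exact match with the maximum; unique() keeps first-appearance order
        winners = table.loc[table[metric] == best, "model_type"].unique()
        summary[metric] = (best, list(winners))
    return summary


# Five rows copied from table_2 (LR row 0, KNN row 8, RF row 75, AB row 85, BAG row 86)
toy = pd.DataFrame({
    "model_type": ["LR", "KNN", "RF", "AB", "BAG"],
    "auc-roc": [0.637597, 0.540322, 0.685155, 0.683571, 0.631672],
    "p_at_1": [0.906977, 1.000000, 0.944186, 0.958140, 1.000000],
})

for metric, (best, models) in best_models(toy, ["auc-roc", "p_at_1"]).items():
    print(f"{metric}: max = {best:.2f}, models = {models}")
```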